Particle swarm optimization for generating interpretable fuzzy reinforcement learning policies

Authors

  • Daniel Hein
  • Alexander Hentschel
  • Thomas A. Runkler
  • Steffen Udluft
Abstract

Fuzzy controllers are efficient and interpretable system controllers for continuous state and action spaces. To date, such controllers have been constructed manually or trained automatically either using expert-generated problem-specific cost functions or incorporating detailed knowledge about the optimal control strategy. Neither of these prerequisites for automatic training is available in most real-world reinforcement learning (RL) problems. In such applications, online learning is often prohibited for safety reasons because it requires exploration of the problem’s dynamics during policy training. We introduce a fuzzy particle swarm reinforcement learning (FPSRL) approach that can construct fuzzy RL policies solely by training parameters on world models that simulate real system dynamics. These world models are created by employing an autonomous machine learning technique that uses previously generated transition samples of a real system. To the best of our knowledge, this approach is the first to relate self-organizing fuzzy controllers to model-based batch RL. FPSRL is intended to solve problems in domains where online learning is prohibited, system dynamics are relatively easy to model from previously generated default policy transition samples, and it is expected that a relatively easily interpretable control policy exists. The efficiency of the proposed approach on problems from such domains is demonstrated using three standard RL benchmarks, i.e., mountain car, cart-pole balancing, and cart-pole swing-up. Our experimental results demonstrate high-performing, interpretable fuzzy policies.
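The idea in the abstract, tuning the parameters of a fuzzy policy with particle swarm optimization against a learned world model, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the one-dimensional `toy_model` stands in for a learned world model, the two-rule Gaussian-membership policy with a `tanh` output is a guess at a plausible policy form, and the swarm is a plain global-best PSO with commonly used coefficients.

```python
import math
import random


def fuzzy_policy(params, x):
    """Hypothetical two-rule fuzzy policy; params = [center, width, output] per rule."""
    num, den = 0.0, 0.0
    for i in range(0, len(params), 3):
        center, out = params[i], params[i + 2]
        width = abs(params[i + 1]) + 1e-3  # keep the width strictly positive
        mu = math.exp(-((x - center) ** 2) / (2.0 * width ** 2))  # Gaussian membership
        num += mu * out
        den += mu
    return math.tanh(num / (den + 1e-12))  # action bounded to [-1, 1]


def toy_model(x, a):
    """Stand-in for a learned world model (assumption): next state from state and action."""
    return x + 0.2 * a


def rollout_return(params, starts=(-1.0, -0.5, 0.5, 1.0), horizon=20):
    """Sum of rewards -|x| over model rollouts from a few start states."""
    total = 0.0
    for x in starts:
        for _ in range(horizon):
            x = toy_model(x, fuzzy_policy(params, x))
            total -= abs(x)
    return total


def pso(fitness, dim, n_particles=20, iters=60, seed=0):
    """Plain global-best PSO that maximizes `fitness` (a simplified variant)."""
    rng = random.Random(seed)
    w, c1, c2 = 0.72, 1.49, 1.49  # inertia and acceleration coefficients (assumed values)
    pos = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: best_fit[i])
    g_pos, g_fit = best_pos[g][:], best_fit[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            f = fitness(pos[i])
            if f > best_fit[i]:  # update the particle's personal best
                best_fit[i], best_pos[i] = f, pos[i][:]
                if f > g_fit:  # update the swarm's global best
                    g_fit, g_pos = f, pos[i][:]
    return g_pos, g_fit


best_params, best_return = pso(rollout_return, dim=6)
# Baseline for comparison: a policy that always outputs action 0.
zero_return = rollout_return([0.0, 1.0, 0.0, 0.0, 1.0, 0.0])
```

Comparing `best_return` with `zero_return` gives a rough sense of whether the swarm found a policy that drives the state toward zero. Because the policy is only six numbers (a center, width, and output per rule), the resulting rules can be read off `best_params` directly, which is the kind of interpretability the abstract refers to.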

Related articles

Q-Value Based Particle Swarm Optimization for Reinforcement Neuro-Fuzzy System Design

This paper proposes a combination of particle swarm optimization (PSO) and Q-value based safe reinforcement learning scheme for neuro-fuzzy systems (NFS). The proposed Q-value based particle swarm optimization (QPSO) fulfills PSO-based NFS with reinforcement learning; that is, it provides PSO-based NFS an alternative to learn optimal control policies under environments where only weak reinforce...

Enhanced Comprehensive Learning Cooperative Particle Swarm Optimization with Fuzzy Inertia Weight (ECLCFPSO-IW)

Various optimization methods have been presented so far. Among the most popular are algorithms based on swarm intelligence, and one of the most successful of these is Particle Swarm Optimization (PSO). Some prior efforts have applied fuzzy logic to remedy defects of PSO such as trapping in local optima and early convergence. Moreover to overcome the problem of i...

Reinforcement Learning with Particle Swarm Optimization Policy (PSO-P) in Continuous State and Action Spaces

This article introduces a model-based reinforcement learning (RL) approach for continuous state and action spaces. While most RL methods try to find closed-form policies, the approach taken here employs numerical on-line optimization of control action sequences. First, a general method for reformulating RL problems as optimization tasks is provided. Subsequently, Particle Swarm Optimization (PS...

Hybrid Stages Particle Swarm Optimization Learning Fuzzy Modeling Systems Design

An innovative hybrid stages particle swarm optimization (HSPSO) learning method, which combines fuzzy c-means (FCM) clustering, particle swarm optimization (PSO), and recursive least squares, is developed to generate evolutional fuzzy modeling systems that approximate three different nonlinear functions. In spite of the adaptive ability of the PSO algorithm, its training result is not desirable for the reason o...

S3PSO: Students’ Performance Prediction Based on Particle Swarm Optimization

Nowadays, new methods are required to take advantage of the rich and extensive gold mine of data given the vast content of data particularly created by educational systems. Data mining algorithms have been used in educational systems especially e-learning systems due to the broad usage of these systems. Providing a model to predict final student results in educational course is a reason for usi...

Journal:
  • Eng. Appl. of AI

Volume 65

Publication date 2017